perf(docker): move version ARG below cached layers to fix cache invalidation #385

Merged: pimlock merged 2 commits into main from perf-docker-version-arg-cache/pm on Mar 17, 2026
pimlock (Collaborator) commented Mar 17, 2026

Summary

Fix Docker layer cache invalidation caused by ARG OPENSHELL_CARGO_VERSION being declared near the top of each Dockerfile. Since the version includes a git commit hash (e.g. 0.0.7-dev.11+g085b131ae), it changes on every build and invalidates all downstream layers — including expensive dependency installs, toolchain setup, and Rust dependency pre-builds.

Related Issue

N/A — discovered via CI timing analysis.

Changes

  • Moved ARG OPENSHELL_CARGO_VERSION from the top of each builder stage to just before the RUN that uses it, in all 5 Dockerfiles:
    • Dockerfile.gateway
    • Dockerfile.cluster
    • Dockerfile.cli-macos
    • Dockerfile.python-wheels
    • Dockerfile.python-wheels-macos
  • Removed unused ARG OPENSHELL_IMAGE_TAG from Dockerfile.cli-macos, Dockerfile.python-wheels, and Dockerfile.python-wheels-macos
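
The reordering described above can be sketched roughly as follows. This is a minimal illustration, not the contents of any of the five Dockerfiles: the base image, packages, paths, and build commands are assumptions made up for the example, and only the ARG name comes from this PR.

```dockerfile
# Hypothetical builder stage; everything except the ARG name is illustrative.
FROM rust:1 AS builder
WORKDIR /build

# Before this change, the version ARG was declared up here, so the RUN steps
# below it missed the cache whenever the version string changed:
# ARG OPENSHELL_CARGO_VERSION

# Slow, rarely changing layers come first so they can stay cached.
RUN apt-get update \
    && apt-get install -y --no-install-recommends pkg-config libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Dependency pre-build against a stub main, keyed only on the manifests.
COPY Cargo.toml Cargo.lock ./
RUN mkdir src \
    && echo 'fn main() {}' > src/main.rs \
    && cargo build --release --locked

COPY src ./src

# After this change, the ARG sits just before the first RUN that reads it,
# so only this step and the ones after it rebuild when the version changes.
ARG OPENSHELL_CARGO_VERSION
RUN echo "${OPENSHELL_CARGO_VERSION}" > /build/VERSION \
    && touch src/main.rs \
    && cargo build --release --locked
```

With this layout, building twice with different `--build-arg OPENSHELL_CARGO_VERSION=...` values should reuse the package install and dependency pre-build layers, assuming the manifests and base image are unchanged.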

Context

A two-week bisect of the build-gateway / Build gateway CI job showed two regressions:

Period    | Avg Duration | Cause
Mar 6-7   | ~2 min       | Baseline (before version ARG was introduced)
Mar 8-11  | ~5.5 min     | +3.5 min after 68525bb8 added ARG OPENSHELL_CARGO_VERSION at top of stage
Mar 12-16 | ~8-9 min     | +2.5 min from base image migration + Dockerfile refactors compounding the issue

Expected improvement: ~5-6 minutes recovered on gateway builds by preserving layer cache for dependency installation, toolchain setup, and the dependency pre-build step.

Testing

  • mise run pre-commit passes (two pre-existing failures unrelated to this change: port-8080-in-use integration test, missing license headers on 3 unrelated files)
  • Unit tests added/updated — N/A, Dockerfile-only change
  • E2E tests added/updated — N/A, will be validated by CI build times

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable) — N/A

…idation

The OPENSHELL_CARGO_VERSION build arg contains a git commit hash that
changes on every build (e.g. 0.0.7-dev.11+g085b131ae). Declaring this
ARG near the top of each Dockerfile invalidated every layer below it --
including expensive dependency installs, toolchain setup, and the Rust
dependency pre-build step -- on every single commit.

Move the ARG declaration to just before the RUN that actually uses it so
upstream layers stay cached. This recovers ~5-6 minutes per build on the
gateway image (from ~9m back toward ~2-3m) and similarly improves cluster,
CLI, and Python wheel builds.

Also removes unused OPENSHELL_IMAGE_TAG ARG from cli-macos, python-wheels,
and python-wheels-macos Dockerfiles.
pimlock self-assigned this on Mar 17, 2026
pimlock added the test:e2e label (Requires end-to-end coverage) on Mar 17, 2026
pimlock requested review from drew and johntmyers, then removed the request for drew, on March 17, 2026 01:20
pimlock merged commit 18fb7af into main on Mar 17, 2026
11 of 12 checks passed
pimlock deleted the perf-docker-version-arg-cache/pm branch on March 17, 2026 01:56